semantic network model and a distributed control
structure  to  accomplish  the  image  analysis  process. 

The process of "understanding an image" leads
to the instantiation of a subset of the model and
the identification of nodes in the instance of the
model with image features. The instantiated
nodes and the relations between them
form another data structure called the sketchmap.
The sketchmap explicates the relation of the
model to the image. This model-image mapping is
accomplished by mapping procedures which are part
of the procedural knowledge in the model.

The procedures are accompanied by descriptions
which contain at least pre- and postconditions
for the procedure and performance
measures for it. Nodes which have attached procedures
may also have an executive procedure
attached. This executive is responsible for
deciding which of several possibly effective
procedures to run. Thus through the executive the
system does a very general kind of procedure invocation
based not only on what the executive
knows about global state, but on a rich description
of the procedure's capabilities.


The  user's  program  is  generally  responsible 
for  allocating  effort  at  a  level  above  that  of  the 
individual  executive  procedure.  Thus  no  single 
domain-independent  formulation  or  methodology  is 
imposed  on  all  vision  tasks.  One  facility  provided 
by  the  system  is  the  use  of  geometric  constraints 
between  model  objects  to  guide  search  for  the 
objects in the image.

The system is an attempt to bring together
many current ideas in artificial intelligence and
vision programming and thereby to cast some light
on fundamental problems of computer perception.
The semantic network facilitates the interplay between
geometric and other relational constraints
which are used to direct and limit search. The use
of attached procedures in the network gives a mix
of declarative and procedural knowledge, and the
executive provides an unusually powerful procedure
invocation scheme. The multiplicity of procedures
allows modelling objects under radically different
conditions and levels of detail. This tends to
make the system robust in that an object which
could not be located initially may be found later
when knowledge about the image has increased.

The system is illustrated throughout the
chapter with examples from two particular
applications: the finding of ships in a dock scene
and the finding of ribs in a chest X-ray film.


1. Knowledge-Directed Image Analysis

Image analysis is the process of attaching
meaning to an image. One way to do this is to use
an explicit model of what the image can contain,
then construct a mapping between the model and
the image. Since a particular image is only an
instance of the class of images that the model was
intended for, the model must be in some sense
"larger" than the image. That is, a useful model
of an image will typically contain
a large amount of information on possible image
content. However, the mapping generated by the
analysis process (if we have a good model) will
typically use only a small portion of this model.

An even smaller portion of the model is used
in specialized analyses, for example, analysis
performed to look for particular objects in the
image. These processes typically require that only
a small portion of the image be mapped onto a
small relevant part of the model that explains
that image. For example, in analyzing a radiograph
for pneumoconiosis, we might use only a small portion
of a radiograph model. We term a task which
instantiates a subset of the model a query, to
emphasize that only a portion of a large possible
mapping is generated, and that we expect the working
environment to be one consisting of a sequence
of such tasks. Examples of a sequence of queries
would be:

-  returning to an aerial photo on
different days to perform different
tasks;

-  different physicians requesting
different differential diagnoses
of a radiograph;

-  generating different land use maps
for agricultural and social scientists
from ERTS data.


Given this approach to image analysis, we have
defined a representation that allows for extensions
of partial mappings which may be known a priori or
acquired sequentially. Additionally, we have a way
of defining quantitatively when the query has been
satisfied so that we do not perform unnecessary
mappings (e.g., the system may or may not need to
know where all parts of an object are if it has
"found" the object at some coarse level of detail).
Thus the concept of query is central to our
approach to image analysis. Given a richly
descriptive image model, our objective is to code
a query to require mapping a minimum of model
structure onto the image.

One of the problems in generating the mapping
to satisfy a query is that the mapping is between
very different structures. The natural elements of
the model are objects which are represented symbolically,
whereas the natural elements of the
image are "picture elements," or pixels. To bridge
this gap, we have structured our vision system in
layers as shown in Figure 1. At the most abstract
end of the structure is a semantic network representing
our model. The model contains generic
information encoded as idealized prototypes of
structures from low level (such as edges) to high
level (such as complex assemblages of objects in
the world).

In the middle we have a sketchmap. This is a
data structure that is synthesized during image
analysis and provides associations between the
model and the image; that is, the synthesized
sketchmap is a network of nodes which turn out to be
instantiations of a subset of model nodes. The
need to differentiate between generic objects and
instantiations of generic objects has been recognized
in natural language understanding work, e.g.
[Hayes, 1977]. We believe that it is also
necessary for image analysis. Besides associations
with the model, sketchmap nodes contain associations
with image structures, such as edges and
regions, specific to the particular image being
analyzed.


At the other end of the vision system is a
third structure termed the image data structure.
This consists of the original image at different
magnifications, spectra, resolutions, etc., together
with various filtered versions of the image,
texture images, edge images, etc. The parameters
for generating all the image data structures are
typically specialized to a particular model-image
mapping context.

This multi-layer structure is reminiscent
of the VISIONS System [Hanson and Riseman, 1977].
Both systems are designed to take advantage of
variable resolution; both have a knowledge base or
model of the world, a subset of which is mapped
into the image. Perhaps the main difference is
that in VISIONS, segmentation of the image into
edges, regions, etc., is made to a level determined
by the model so that the image will be
understood to the fullest possible extent, given
the knowledge in the model. In the system described
here, the user's query is responsible for the level
of detail the system pursues.

A successful analysis of an image contains
two parts: the generation of the proper links between
sketchmap nodes and image data structures
and the generation of the proper links between
sketchmap nodes and model nodes. We describe a
procedure that generates the image-sketchmap link
as a mapping procedure. A mapping procedure is a
low-level procedure which is attached to a particular
node in the sense of [Bobrow, 1977] and
whose function is to refine the description of
that node. The need for these kinds of procedures
in image analysis has also been recognized by
[Sloman, 1977]. The construction of links between
the relational world model and the sketchmap is
one of the functions of the executive procedure.
The executive procedure embodies the overall strategy
for achieving the goal(s) and is programmed
in a high-level language (currently SAIL).


[Figure 1 layers: Image Data Structures, Sketchmap, Model]

Figure 1. Basic Layer Structure


Since the best procedures for finding structures
in real-world images are special-purpose,
we have avoided imposing a uniform problem-solving
regime on the mapping procedures; instead they
must be coded especially to take advantage of the
user's specialized knowledge of the domain. However,
some program autonomy is allowed in the
choice of mapping procedures. Where different mapping
procedures can generate the same links between
sketchmap nodes and image data structures,
the executive procedure can select the most
appropriate based on a description attached to
each mapping procedure. Also, the executive procedure
can use general geometric relational constraints
to pin down the location of objects.
2. The Structure of the Model


2.1 The Concept of Template Nodes


The model holds different kinds of knowledge
about the image domain. It includes a relational
network of nodes which are identifiable with
(primitive and complex) objects and concepts in
the domain from which the scene is taken. The
answer to a query is a synthesized sketchmap or
description; this is an instantiation of a subset
of the model. The model, therefore, contains knowledge
in the form of all potential instantiated
descriptions. An example of this kind of knowledge
is the assertion:

"the  sternum  is  above  the  heart." 

This can be readily included in the model network.
It is potentially part of the model-image mapping
("the sternum" and "the heart" could be instantiated
with pointers to regions of the picture). The
model also contains knowledge which is not in the
form of a synthesized description but which, for
instance, could be used in generating a description.
An example of this kind of knowledge is:

"ships are about 6 times as long as they are wide."

This knowledge can be included in the model network
and may help a program that is searching for
a ship, but will not become part of the model-image
mapping. When it is meaningful to differentiate
between these two kinds of knowledge, we will
refer to the parts of the network that represent
the former kind as template nodes. Synthesized
descriptions will be directly related to these
nodes and their arcs.

Each template node has a substructure which
represents the sense in which that node is to be
"understood." Prior to a query, the meaning of a
node is defined in terms of a substructure of
mapping procedures and constraint relations. After
a query has been satisfied, the template node is
represented by instantiated nodes with attached
specialized location descriptions, as shown in
Figure 2.1. Four basic types of links provide a
simple syntax to the network structure. A powerful
advantage to this syntax is that the executive
procedure can direct the analysis in a more
general way, by using programs that function on
classes of links representing different kinds of
concepts, rather than on some set of specific
links. The need for this organization in natural
language understanding has been recognized by
Brachman [1976].


Figure 2.1 The Next Level of Detail in Nodes
(figure label: Sketchmap Nodes)


2.2  Constraints 

Links  to  other  model  nodes  encode  (perhaps 
parametrized)  constraint  relations  between  model 
nodes.  Links  can  encode: 

-  the  probability  that  the  relationship 
holds; 

-  a  quantifier  representing  the  expected 
value  of  the  relationship. 
For example, the relationship SHIP ADJACENT DOCK
might have a certain probability of being true, and
an expected distance that the ship is from the
dock. We refer to the template nodes and geometric
relations between them as the constraint network.
This network may be interpreted from two viewpoints.
First, the existence of crucial constraint relations
may be checked. This may be done through the
matching features of the associative structures in
SAIL. Secondly, if particular parameters arising
from a constraint are needed, the network may be
evaluated like a program to find subsets of the
model or the image that satisfy the constraints.
Its results take account of partial or unspecified
information, and it may be updated upon receipt of
better data with a minimal amount of work. It is
much like the graph of variable dependencies in AL
[Feldman et al., 1975]. In brief, each node has a
"Constraint Operation," such as Intersection,
Translation, Union, or indeed any function of up to
two arguments; it has two operand nodes; a father
node; a status that may be "Up-To-Date" or
"Out-Of-Date"; and a value that is some data structure such
as a number, a list of linear objects, a region,
etc. Additionally, it may have information on how
difficult the node is to evaluate, as a function
of the contents of its operand nodes. This last
feature allows some cost/benefit analysis of
evaluation of sections of constraint networks.
The constraint network for the prose:

"The centroids of docked ships are on
lines parallel to the intersection of
coastlines with dock areas at a distance
of one-half a ship width"

is shown in Figure 2.2.


Figure 2.2 A Portion of a Constraint Network
(I = Instance, LD = Location Descriptor)


The network starts out with data (from the
model or from previous scene analysis) as the
values of the tip nodes, but no values at nonterminal
nodes and all nonterminals marked Out-Of-Date.
Data at a tip node can have one of three
statuses: it can be known that the object does not
exist in the scene (so the value of the node is
the null set), it can be known to some degree of
accuracy where objects are in the scene (so the
value of the node is a subset of image or world
points), or perhaps nothing is known (in which
case the object could be anywhere, and the value
is implicitly the universe of image or world
points).

When the constraint network is evaluated to
determine what is known about the location of its
object, each node recursively evaluates its
Out-Of-Date operand nodes, performs its operation, and
stores the result in its value. It marks its status
Up-To-Date. Intersection and Union work properly
with the definitions of partial information of the
last paragraph. When new (or better) information
about an object at a tip of the network comes in,
all nodes on a path from the tip to the root are
marked Out-Of-Date. Then when the network is next
evaluated, (only) the necessary partial results
are re-computed. In keeping with our philosophy,
the network is not self-activating, but is run on
explicit user command.
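
The evaluation scheme just described can be sketched in a few lines. The Python fragment below is only an illustration and not the SAIL implementation: the class ConstraintNode, its method names, the 8 x 8 grid standing in for the universe of image points, and the translate_right operation are all assumptions made for the example. It shows lazy evaluation with Up-To-Date and Out-Of-Date marks, and that new data at a tip forces recomputation only along the path from that tip to the root.

```python
# Hypothetical sketch of a constraint network with lazy re-evaluation.
# Assumption: a small finite grid stands in for "the universe of image points".
UNIVERSE = frozenset((x, y) for x in range(8) for y in range(8))


class ConstraintNode:
    def __init__(self, operation=None, operands=(), value=None):
        self.operation = operation        # function of up to two point sets
        self.operands = list(operands)    # at most two operand nodes
        self.father = None                # set when this node is used as an operand
        self.value = value
        self.evaluations = 0              # how many times the operation actually ran
        # Tip nodes hold data and are Up-To-Date; nonterminals start Out-Of-Date.
        self.up_to_date = operation is None
        for child in self.operands:
            child.father = self

    def evaluate(self):
        """Evaluate Out-Of-Date operands, apply the operation, cache the result."""
        if not self.up_to_date:
            args = [child.evaluate() for child in self.operands]
            self.value = self.operation(*args)
            self.evaluations += 1
            self.up_to_date = True
        return self.value

    def update(self, new_value):
        """New data at a tip: mark every node on the path to the root Out-Of-Date."""
        self.value = new_value
        node = self.father
        while node is not None:
            node.up_to_date = False
            node = node.father


def translate_right(points):
    """A toy Translation constraint: shift a point set one unit in x."""
    return frozenset((x + 1, y) for (x, y) in points) & UNIVERSE


dock = ConstraintNode(value=frozenset({(2, 3), (3, 3)}))
coast = ConstraintNode(value=UNIVERSE)    # nothing known yet: could be anywhere
water = ConstraintNode(value=frozenset({(3, 3), (4, 3), (5, 5)}))

shifted = ConstraintNode(translate_right, [dock])
meet = ConstraintNode(frozenset.intersection, [coast, water])
root = ConstraintNode(frozenset.intersection, [shifted, meet])

first = root.evaluate()                   # evaluates every nonterminal once
coast.update(frozenset({(4, 3)}))         # better data arrives at one tip
second = root.evaluate()                  # only `meet` and `root` are recomputed
```

Because a tip with no information holds the whole universe, Intersection simply passes the other operand through, which matches the three tip statuses described in the text.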
2.3 Location Descriptors

A  location  descriptor  provides  information 
about  where  to  find  an  entity.  The  part  of  the 
location  descriptor  which  specifies  a  point  set 
enclosing  the  region  has  been  referred  to  as  a 
tolerance  region  [Bolles,  1975].  A  shape  location 
descriptor  might  have  the  structure  shown  in 
Figure  2.3. 


    [ ShapeLocationDescriptor
        nodetype:     specialization prototype
        instance-of:  a LocationDescriptor
        locates:      OneOf {(a ShapeObject), (a ShapeFeature)}
        coordsystem:  a CoordinateSystem
        centroid:     a PointSet      //allows for "fuzziness"
        orientation:  an AngleRange   //...ditto
        tol. region:  a PointSet ]

    ...similarly for Point, Linear, and AreaLocationDescriptors

    [ CoordinateSystem
        nodetype:     abstract prototype
        units:        a LengthUnitSpecification
        scale:        a NumberRange   //length units / system unit
        transforms:   SetOf {((a CoordinateTransform)
                              (a CoordinateSystem)), ...


Figure  2.3  Example  of  a  Shape  Location  Descriptor 


This organization is suggestive of a frame-like
structure [Minsky, 1975]. However, not all the
entries need exist; just the syntax is necessary
to allow the entries to be found. In practice only
the properties relevant to a particular query will
be generated. Such partial instantiations are easy
in the SAIL associative structures [Feldman and
Rovner, 1969].
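
A partially instantiated descriptor of this kind might be sketched as follows. This is a hypothetical Python rendering, not the SAIL associative structure: the field names follow Figure 2.3, while the use of None for an entry that has not been generated and the narrow method are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import FrozenSet, List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class ShapeLocationDescriptor:
    """Slots follow Figure 2.3; None marks an entry that has not been generated."""
    locates: str
    coordsystem: Optional[str] = None
    centroid: Optional[FrozenSet[Point]] = None        # a point set, so it can be fuzzy
    orientation: Optional[Tuple[float, float]] = None  # an angle range
    tol_region: Optional[FrozenSet[Point]] = None      # tolerance region

    def filled_slots(self) -> List[str]:
        """Names of the entries that exist so far."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]

    def narrow(self, region) -> None:
        """Intersect the tolerance region with new evidence (None means anywhere)."""
        region = frozenset(region)
        if self.tol_region is None:
            self.tol_region = region
        else:
            self.tol_region = self.tol_region & region


ship = ShapeLocationDescriptor(locates="SHIP-1")
slots_before = ship.filled_slots()
ship.narrow({(1, 1), (1, 2), (2, 2)})
ship.narrow({(1, 2), (2, 2), (3, 2)})
```

Only the slots a query actually needs are ever filled; here the centroid and orientation stay empty while the tolerance region is refined twice.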
Our examples of geometrical constraints, locations,
etc., are two-dimensional. There is a large
class of interesting images that are inherently
two-dimensional (ERTS images, light or transmission
electron microphotographs, CAT scanner images,
bio-ultrasound images) as well as some that for
some purposes may be treated as two-dimensional
(aerial reconnaissance imagery, medical X-ray
imagery, natural scenes, etc.). Of course it is
often helpful to know about 3-D when processing
natural scenes [Garvey, 1976], and it has been
demonstrated that a 3-D model of the world is
necessary to accomplish some tasks with aerial
mapping photographs taken from 35,000 feet [Barrow,
1977]. Within the framework of the system described
here, 3-D world coordinate systems would be linked
through camera-transformation coordinate transforms
to image coordinate systems. The location
descriptors would be in terms of the relevant
coordinate systems.

There are many advantages to having a standard
representation for object locations:

a.  If such descriptions are data types, their
computations can be separated from the
procedures that use them. If they can be
passed as arguments, they provide a certain
"common currency" between procedures, thus
simplifying and modularizing the procedures
that use them.

b.  Location descriptors can represent approximate
locations, which is useful for queries
unconcerned with exact answers.

c.  Constraints between locations can propagate
knowledge throughout the model. Location
descriptors can be computed from
other location descriptors via relations,
or by union and intersection of the
described point sets. A system which
applied linear programming techniques to
the problem of locating regions through
constraints placed on their boundaries was
developed in [Taylor, 1976].

d.  Use of location descriptors is geared to
an abandonment of the exhaustive segmentation
paradigm wherein every region must
correspond to some object. Different
location descriptors may refer to disjoint
point sets or may overlap on the image.


3.  Control

3.1  General  Philosophy 

Generally a query results in the synthesis of
a sketchmap with instance nodes whose location
descriptors are accurate enough for the purposes
of the query. A query might also result in further
refinement of location descriptors or the extension
of an existing sketchmap to account for more image
structure. A query-directed vision system should
thus be able to use relevant information (i.e.,
the state of the analysis) generated in successive
queries. Most queries will take the form of
user-written executive programs, since nontrivial tasks
usually require fairly rigid recommendations about
how the system should go about solving them.
Initially the system will not attack the problem
of automatically translating queries in some command
language into executive programs.


Figure 3.1 shows the SAIL code used in a very
simple executive procedure for selecting mapping
procedures which identify instances of rib nodes
in chest radiograph images. Each mapping procedure
has pre-conditions, including an associated
accuracy measure, which can depend on its neighbors,
as well as a cost measure. The cheapest rib
procedure which satisfies the pre-conditions is
selected. Each rib node is searched for once and
there is no facility for dealing with failures or
mistakes. But the important point here is that
the executive can have a relatively simple structure.
This facilitates experimentation with
various control strategies other than the
depth-first strategy shown in the example.


3.2  Characterizations  of  Mapping  Procedures 

Mapping procedures have associated descriptions
which are used by executive procedures. The
descriptions contain the following:

-  the slots in the data object which
must be filled for the procedure
to run;

-  the slots the procedure can fill in;

-  the cost and accuracy of the procedure
in some meaningful units;

-  the a priori reliability of the
procedure.

Some rib mapping procedure descriptions are shown
in Table 4.1, but these do not tax our representation
scheme. More difficult examples of the kinds
of facts we expect to be able to encode in this
structure are (for a straight-line structure) that
a Hough transform [Duda and Hart, 1972] cannot
find the endpoints of a line but is more reliable
than the cheaper Shirai tracker [Shirai, 1975],
which itself needs to know the direction of a line
before it can track it, and that a Hueckel operator
[Hueckel, 1971] is more expensive, but can furnish
many facts about the line with little known
a priori, and can rate itself on reliability of
its result.


There are several advantages to separating
the executive procedure from the mapping procedures
and their descriptions:

a.  The executive procedure can be written
more easily without considering the implementation
details of mapping procedures in
great depth.

b.  Mapping procedures are similarly simplified
without the burden of determining an
appropriate context for their application
[Sloman, 1977].

c.  The executive procedure can automatically
select alternative procedures in the event
of mapping procedure failures.

d.  Descriptions allow a choice between methods
(if several are available) based on capability,
resource requirements, and a priori
reliability. (Also, recovery from failure
of individual routines can be automated
through planning [Feldman and Sproull,
1975].)

e.  If the mapping procedures can produce
reliable a priori estimates of their success,
the analytical results of [Bolles, 1975]
and [Taylor, 1976] could be extended to
select the procedure which produces sufficiently
exact data objects.


    Recursive Procedure MatchRib(itemvar Node);
    begin
        itemvar x, v; integer Var;
        if INSTANCE of Node is ANY
        then
        begin
            Print("rib ", Node, " already matched");
            return;
        end
        else
        ! find and run procedure to do job at min cost;
        begin
            itemvar TempProc; integer MinCost, TempCost;
            MinCost := VeryLarge;
            foreach x such that
                RIB!PROCEDURE of Node is x do
            begin
                Var := GetConstraintsAndVariance(Node, x);
                if Var < Tolerance
                then
                begin
                    TempCost := FindCost(Node, x);
                    if TempCost < MinCost
                    then
                    begin
                        TempProc := x;
                        MinCost := TempCost;
                    end;
                end;
            end;
            if MinCost = VeryLarge
            then
                Print("No proc. can do job for rib ", Node)
            else ApplyProc(TempProc, Node);
            foreach v such that NEIGHBOR of Node is v
                and TYPE of v is RIB
            do MatchRib(v);
        end;
    end;


Figure  3.1  Executive  Procedure  for  Ribs 
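
For readers without access to SAIL, the control structure of Figure 3.1 can be paraphrased in Python. This is a sketch under stated assumptions rather than a translation of the running system: RibNode, Procedure, variance_if_applicable and make_chain are hypothetical names, the costs and variances are those listed in Table 4.1 (reading the last variance entry as 5, which is an assumption about a poorly legible value), and the attempted set is an added guard so that the recursion terminates even when no procedure applies to a node.

```python
from dataclasses import dataclass

VERY_LARGE = float("inf")


@dataclass
class Procedure:
    name: str
    cost: int
    variance: int
    needs_neighbor: bool  # precondition: an instantiated neighbor in the sketchmap


# Values as listed in Table 4.1 (variance of HallucinateARib assumed to be 5).
PROCEDURES = [
    Procedure("LookForARib", 20, 0, False),
    Procedure("AffirmARib", 4, 1, True),
    Procedure("HallucinateARib", 1, 5, True),
]


class RibNode:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.instance = None  # name of the procedure that instantiated this rib


def variance_if_applicable(node, proc):
    """Stand-in for GetConstraintsAndVariance: infinite if preconditions fail."""
    if proc.needs_neighbor and not any(n.instance for n in node.neighbors):
        return VERY_LARGE
    return proc.variance


def match_rib(node, tolerance, attempted=None):
    """Pick the cheapest procedure within tolerance, apply it, visit neighbors."""
    attempted = set() if attempted is None else attempted
    if node.instance is not None or node.name in attempted:
        return  # already matched, or already tried once (added guard)
    attempted.add(node.name)
    best, min_cost = None, VERY_LARGE
    for proc in PROCEDURES:
        if variance_if_applicable(node, proc) < tolerance and proc.cost < min_cost:
            best, min_cost = proc, proc.cost
    if best is not None:
        node.instance = best.name  # stand-in for ApplyProc
    for neighbor in node.neighbors:
        match_rib(neighbor, tolerance, attempted)


def make_chain(names):
    """Build rib nodes in which consecutive ribs are neighbors of each other."""
    nodes = [RibNode(n) for n in names]
    for a, b in zip(nodes, nodes[1:]):
        a.neighbors.append(b)
        b.neighbors.append(a)
    return nodes


ribs = make_chain(["RIGHTRIB4", "RIGHTRIB5", "RIGHTRIB6"])
match_rib(ribs[0], tolerance=2)
chosen = [r.instance for r in ribs]
```

With a tolerance of 2 the first rib can only be found by LookForARib, after which its neighbors are within reach of the cheaper AffirmARib; loosening the tolerance lets the executive fall through to HallucinateARib.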
4.  Applications


4.1  Finding  Docked  Ships 

Finding ships in a dock scene illustrates how
high-level metrical knowledge about the image (such
as provided by a topographic map) can make certain
scene analysis problems easy.

The model contains, in a Constraint Graph
form (see Section 2.2), the knowledge that docked
ships are in the ocean adjacent to dock areas,
parallel to the dock and with a centroid a distance
away related to the width of the ship. In a
Shape Object Descriptor, some facts about the
sorts of ships we are trying to find are stored,
viz., a template for matching them (in our case, a
rectangle of 1's in an array for template-matching),
their width, length, average brightness,
etc. Template-matching is among the simplest
vision primitives. Only in a context having a great
deal of structure could it be expected to work in
scenes as complex as Figure 4.1a.

Figure 4.1a is from a USGS mapping photograph.
It roughly corresponds to the topographic map of
Figure 4.1b. Included in the map are such linear
features as coastlines and dock areas. From the
digitized photo, a small (196 x 164) window is
extracted and stored on disk. A half-toned version
of this window is shown in Figure 4.1c. From the
map, the coastline and a dock area are extracted
and stored on disk; this information is shown in
Figure 4.1d. Map information may be automatically
registered with photographic images to high
accuracies by techniques developed at SRI [Barrow
et al., 1977]. For our study the registration was
performed manually.

The system, under direction of the user-written
query, begins by deciding where to look by
satisfying a constraint network; the more information
provided, the narrower the focus of attention.
In the case illustrated in this section, the constraint
network looks as it does in Section 2.2.
Presupposition of "perfect" registration leads to
sharp lines of search specifying loci of ship
centers. Imperfect registration would give fuzzier
loci.


The linear loci and the orientational constraint
on the ships mean a simple template-matching
technique will suffice to do the ship-finding
job efficiently. (In this exercise it was
the only technique, but an executive procedure
might well have chosen it as applicable.) The ship
template is rotated to be parallel to the midline
as given by the constraint graph, and template-matching
is done along the line; note is taken of
where the score for the match goes over threshold,
and when it comes back down under threshold. The
average of these two positions is taken as the location
of a ship. The black squares in Figure 4.1d
show the results.
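
The threshold-crossing rule can be illustrated with a short Python sketch. It assumes that the match scores along the search line have already been computed and are given as a list indexed by position; the function name locate_ships and the treatment of a run that reaches the end of the line are assumptions, not details taken from the system.

```python
def locate_ships(scores, threshold):
    """Return ship positions as midpoints between up- and down-crossings.

    `scores` holds the template-match score at successive positions along
    the locus line. A ship is reported at the average of the position where
    the score goes over threshold and the position where it comes back under.
    """
    ships = []
    start = None
    for position, score in enumerate(scores):
        if score > threshold and start is None:
            start = position                       # score goes over threshold
        elif score <= threshold and start is not None:
            ships.append((start + position) / 2)   # score comes back under
            start = None
    if start is not None:
        # Assumption: a run still over threshold at the end of the line is
        # closed at the line's end.
        ships.append((start + len(scores)) / 2)
    return ships


scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.2, 0.1, 0.6, 0.9, 0.3]
found = locate_ships(scores, threshold=0.5)
```

For the scores above the crossings occur at positions 2 and 5 and at positions 7 and 9, so two ships are reported.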

Our USGS mapping photograph is digitized to
256 grey levels on a .007" grid. The image is
stored on disk with comprehensive and expandable
header information. The image may be windowed and
sampled at integral size reductions into an integer
array in core for processing.

The system has representations for linear
objects and regions. Linear objects are SAIL
records making linked lists of (x,y) points. They
can have three types at present: a list of points
to be connected in order; a list of segments, i.e.,
pairs of endpoints to be connected pairwise; and
logically circular lists of points representing
boundaries. A robust and general routine based on
merging was written to compute the intersection
of such linear features. Other useful geometric
routines find the distance of a point from a segment
(not a line), and compute a segment parallel
to and some distance from another segment.
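
The two geometric routines mentioned last are standard, and a minimal Python sketch is given here for concreteness. The function names and the convention that a positive distance displaces the segment to the left of its direction are assumptions of this example, not a description of the SAIL routines.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (not the infinite line)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)      # degenerate segment
    # Parameter of the projection of p onto the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def parallel_segment(a, b, distance):
    """Segment parallel to a-b, displaced `distance` to the left of a->b."""
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("cannot offset a zero-length segment")
    nx, ny = -dy / length, dx / length           # unit normal to the segment
    return ((ax + distance * nx, ay + distance * ny),
            (bx + distance * nx, by + distance * ny))


beside = point_segment_distance((2, 3), (0, 0), (4, 0))   # projection falls inside
beyond = point_segment_distance((5, 3), (0, 0), (4, 0))   # nearest point is an endpoint
locus = parallel_segment((0, 0), (4, 0), 2)
```

The second call shows why the distinction between a segment and a line matters: the distance to the line would be 3, but the nearest point of the segment is its endpoint (4, 0). A parallel segment of this kind is what the docked-ship constraint uses for the locus of centroids at one-half a ship width from the dock.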


Regions (except for templates, which are
arrays) are SAIL list items. A region is a list of
y-lists; a y-list has a y-value followed by an
even number of x-values. The first x-value is an
"entering region" boundary point, the second is a
"leaving region" boundary point, and so on alternately.
The region:

    001
    101
    011

would be represented as

    ((1 2 3) (2 1 1 3 3) (3 3 3)).

Routines were written, again based on merging, to
create the union and intersection of such regions,
and to convert (via an asymmetric DDA algorithm
[Newman and Sproull, 1973]) linear objects to
regions. We find multiple representations of objects
simplify the work of routines such as the
constraint primitives.
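
The y-list encoding and a merge-based intersection can be sketched in Python as follows. Two conventions are assumed because they reproduce the example in the text: y is counted from 1 at the bottom row of the array and x from 1 at the left, and the entering and leaving x-values are both inclusive. The function names are hypothetical.

```python
def to_ylists(rows):
    """Encode a binary picture (strings of '0'/'1', top row first) as y-lists."""
    height = len(rows)
    region = []
    for y in range(1, height + 1):
        row = rows[height - y]            # y = 1 is the bottom row (assumed)
        ylist = [y]
        x = 1
        while x <= len(row):
            if row[x - 1] == "1":
                start = x                 # "entering region" boundary point
                while x < len(row) and row[x] == "1":
                    x += 1
                ylist += [start, x]       # x is the "leaving region" point
            x += 1
        if len(ylist) > 1:
            region.append(ylist)
    return region


def runs(xs):
    """Pair up alternating entering/leaving x-values."""
    return list(zip(xs[0::2], xs[1::2]))


def intersect_runs(a, b):
    """Intersect two sorted lists of inclusive runs by merging them."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:             # advance whichever run ends first
            i += 1
        else:
            j += 1
    return out


def intersect_regions(r1, r2):
    """Intersection of two regions given as lists of y-lists."""
    rows2 = {ylist[0]: ylist[1:] for ylist in r2}
    result = []
    for ylist in r1:
        y, xs = ylist[0], ylist[1:]
        if y in rows2:
            common = intersect_runs(runs(xs), runs(rows2[y]))
            if common:
                result.append([y] + [v for run in common for v in run])
    return result


region_a = to_ylists(["001", "101", "011"])
region_b = to_ylists(["111", "100", "010"])
overlap = intersect_regions(region_a, region_b)
```

Here region_a is the region shown in the text, and the encoding agrees with the representation given there. Matching rows are looked up through a dictionary for brevity, whereas the runs within a row are intersected by a true merge.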

Template-matching utilities can produce an
array containing a rotated and scaled version of a
template and can compute the correlation of a template
(at some rotation and translation) with the
image array.


4.2  Finding Ribs in Chest Radiographs

The problem of finding ribs in chest radiographs
illustrates the use of multiple procedures
attached to the same template node (cf. Section
3.1). It uses the less precise geometric constraints
arising from anatomy rather than cartography.

The model contains nine right and left ribs
(the maximum number normally visible on a chest
film). Presently only the lower edge of each rib
is detected. Each rib is modelled as a template
node with offset parameters from itself to each
immediate neighbor (above, below, opposite). Additionally,
three different mapping procedures are
attached to each rib node as shown in Table 4.1.

LookForARib uses the Wechsler parabolic model
[Wechsler and Sklansky, 1975] to find a rib
segment. AffirmARib translates that segment using the
offset parameters and attempts to verify the
presence of a rib by a correlation technique.
HallucinateARib instantiates a rib by translating a
neighbor with no verification.


Table 4.1 RibFinding Mapping Procedures

Procedure        Preconditions         Cost  Var.  Postconditions
---------------  --------------------  ----  ----  ---------------
LookForARib      none                  20    0     instance of rib
AffirmARib       instance of neighbor  4     1     instance of rib
                 in sketchmap
HallucinateARib  instance of neighbor  1     5     instance of rib
                 in sketchmap

Figure 4.2 shows a trace of the display
during the rib-finding process. For this trace a
slightly more complex executive than the one
shown in Figure 3.1 was used. If a mapping
procedure failed, another was chosen from the
remaining applicable set. In Figure 4.2a, large
rectangles enclosing the lung fields have been found
(by a lung query executive) and the smaller
rectangles are plans for LookForARib, which is
the only mapping procedure that can be applied.

The horizontally-oriented rectangle defines an
area to look for rib edges for the model node
RIGHTRIB4 and the vertically-oriented rectangle
defines an area for foci of a parabola
representing the rib border. Figure 4.2b shows the
resultant rib found. Figure 4.2c shows the plan
derived from the constraints for the opposite
rib, LEFTRIB4. Note that the plan now has the
shape of RIGHTRIB4. Figure 4.2d shows the
instantiation of LEFTRIB4 found by AffirmARib. Figure
4.2e shows the next two ribs found and Figure 4.2f
shows the entire set of ribs. The ribs marked with
the box (□) are found by HallucinateARib, due to
the failure of AffirmARib. AffirmARib fails when
the edge data is extremely poor.
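
The behavior of the rib executive in this trace can be sketched as follows. This is a Python illustration only, not the executive of Figure 3.1. The text says that a failed mapping procedure is replaced by another from the remaining applicable set, but does not give the selection rule; ordering by cost plus variance is an assumption, chosen because it reproduces the trace (AffirmARib is tried before HallucinateARib). The variance of HallucinateARib is taken to be 5, which is also an assumption, and the procedure bodies are stubs.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class MappingProcedure:
    name: str
    needs_neighbor: bool      # precondition: a neighbor instance in the sketchmap
    cost: float
    variance: float
    run: Callable[[str, dict], Optional[str]]   # returns an instance, or None

def rib_executive(node, sketchmap, neighbors, procedures):
    """Try applicable procedures in order of cost + variance (assumed rule)."""
    has_neighbor = any(n in sketchmap for n in neighbors)
    applicable = [p for p in procedures if has_neighbor or not p.needs_neighbor]
    for proc in sorted(applicable, key=lambda p: p.cost + p.variance):
        instance = proc.run(node, sketchmap)
        if instance is not None:             # success: record it and stop
            sketchmap[node] = instance
            return proc.name
    return None                              # every applicable procedure failed

# Stub procedures for illustration; real ones would examine the image.
POOR_EDGE_DATA = set()                       # hypothetical: nodes where verification fails

def look(node, sketchmap):
    return f"{node}: parabola fit"

def affirm(node, sketchmap):
    return None if node in POOR_EDGE_DATA else f"{node}: verified by correlation"

def hallucinate(node, sketchmap):
    return f"{node}: translated neighbor"

PROCEDURES = [
    MappingProcedure("LookForARib", False, 20, 0, look),
    MappingProcedure("AffirmARib", True, 4, 1, affirm),
    MappingProcedure("HallucinateARib", True, 1, 5, hallucinate),
]
```

With an empty sketchmap only LookForARib is applicable, as in Figure 4.2a; once a neighbor has been instantiated the cheaper procedures become available, and HallucinateARib is reached only when AffirmARib fails.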

To appreciate that the ribs found by the rib
executive actually match the edge data, compare
Figures 4.3a and 4.3b, which show the results
from another chest radiograph. Figure 4.3a shows
the principal edges in the image, and Figure 4.3b
shows the ribs overlaid on those edges.


The semantic network is a kind of lumped
parameter model in the spirit of [Fischler and
Elschlager, 1973]. The geometric constraints in the
network relate template nodes whose descriptions
(the "lumped parameters") are generated by attached
mapping procedures. The key difference is that
information found during the analysis can change the
way template nodes are located.

In analysing an image it is crucial that the
generating of abstract descriptions of parts of
the image (i.e. segmentation) be intimately
connected with the interpretation of those parts.
In our system the former operation corresponds to
generating sketchmap-image links whereas the latter
corresponds to generating model-sketchmap links.
Interpretation and segmentation are united through
multiple mapping procedures and the executive,
which can efficiently change the way a part of the
image is analyzed as new information about the
rest of the image develops.

Finally, we want the image analysis process
to do as little work as possible to satisfy a
given task or query. This is attempted through the
specialization of all parameters to the given task,
the inclusion of performance and accuracy measures
in the mapping procedure descriptions, and the use
of the constraint network. All of this is just the
beginning of a long-term effort to study what can
be done in a general way for goal-directed image
understanding tasks.


Figure 4.1d Coastline, Dock Area, Loci of Possible
Ship Centers, Points of Application of
Ship Template and Location of Ships.


Figure 4.2a Lung Boundaries and Plan for RIGHTRIB4


Figure 4.2b RIGHTRIB4 Found
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